Performance improvements #445

fmd · 2020-04-29T20:02:38Z

In short, this PR makes helmsman call helm less.

If we have 100 apps, and they are all on the same version of the same chart, helmsman would validate that chart 100 times in a row with identical calls to helm. This doesn't need to happen. I kept the idea of running helm commands concurrently a resourcePool number of times (more on resourcePool vs the -p flag later in this comment), but I have expanded it -- we concurrently ask helm about the charts (like we used to), but we only do so once per chart version pair (at least in makePlan).

Compared to my previous PR, I am no longer trying to cache chart names and versions in an atomic sync.Cache. Since we went to the OS and called helm a bunch of times concurrently, they'd all check the cache and see the result wasn't there, then all the resources in the pool would attempt to fill the cache by calling the expensive helm command at the same time. It was still faster (because I decoupled validateChart from the release), but this is faster still.

I've updated the Dockerfile quite a bit too, I don't know if it'll pass CircleCI.

For our use case these changes seem to roughly double the speed of our deploys on helmsman. However, I haven't done any profiling (just a few very quick eyeball tests) and I haven't written any tests that test when there's lots of releases on a mixture of charts.

resourcePool now applies slightly differently to makePlan. I suggest we combine resourcePool with the -p flag so we can set the resourcePool number ourselves and have it apply consistently, but another time I want to look at how the concurrency on helm commands is performing across the board. Some helm commands still aren't run concurrently (like r.checkChartDepUpdate() which I haven't looked at yet), and i have no data on what kind of resourcePool number might be optimal, but this is a start.

This could definitely be DRYed up and refactored but I think these changes should provide some inroads into making helmsman more performant, especially when someone is deploying a lot of apps that share charts (as we are).

fmd · 2020-04-30T11:20:21Z

Even with these changes, it's still a lot of calls to helm. There must be a way of doing this that minimises calls to helm further by design, but I'm quite busy so I can't think of it yet

fmd · 2020-04-30T11:29:51Z

luisdavim · 2020-05-01T17:46:30Z

internal/app/decision_maker.go

+	versionsC := make(chan [4]string, len(s.Apps))
+
+	// We store the results of the helm commands
+	extractedChartNames := make(map[string]string)


probably using a struct for the extracted chart names and versions would be cleaner but I guess we can do that in another iteation

mkubaczyk

I went through your code with debugger to make sure I correctly get your whole idea.
I really like what you did with Dockerfile. This should be even separate PR so it goes quicker.
There are really as you said in the code's comments places that violates DRY (not that we were not doing this before), but let's push it and I hope at some point this is getting fixed too. We just don't want anyone to lose motivation. :-)
Looks good overall, but due to code's repeat it complicated the reading much and at some point I was concerned. Will you please take a look at comments I left before we can go with it any further?

mkubaczyk · 2020-05-08T07:08:20Z

internal/app/helm_helpers.go

@@ -53,6 +53,8 @@ func getHelmVersion() string {
 	if result.code != 0 {
 		log.Fatal("While checking helm version: " + result.errors)
 	}
+
+	log.Verbose("Helm version " + result.output)


there's no need to log that since the helm version is printed in cli.go:152

internal/app/release.go

fmd · 2020-05-11T07:53:35Z

@mkubaczyk i've seen your comments, i'll address them and clean this up in a day or so

mkubaczyk · 2020-05-11T15:33:11Z

Please do. We are waiting with the release for your changes to join it as well. :-)

…ion pairs to validate

fmd · 2020-05-12T13:51:50Z

@mkubaczyk fixed your comments.

I want to cache the downloaded tars that Docker curls somehow, because at the moment it pushes new layers on every build which is wasting about the same amount of space on our hub as the old Dockerfile, but like you said it might not be too late to split out the Dockerfile changes into a new PR. Maybe we split the helm-installer part of the multistage build into its own Dockerfile that's ran with a different make command? I'd like to find a way to make it only have to curl the kubectl/helm/plugins tar once and then cache it for future builds. I don't want to keep pushing new layers with the same binary to the hub. What I've done here is a step towards that, removing the script was good I think, but we're not fully there.

internal/app/decision_maker.go

fmd · 2020-05-14T13:35:06Z

internal/app/helm_helpers.go

@@ -25,8 +25,8 @@ func helmCmd(args []string, desc string) command {
 }

 // extractChartName extracts the Helm chart name from full chart name in the desired state.
-func extractChartName(releaseChart string, version string) string {
-	cmd := helmCmd([]string{"show", "chart", releaseChart, "--version", version}, "Show chart information for version "+version)
+func extractChartName(releaseChart string) string {


I reverted this change -- we no longer use this when we already have a version

fmd · 2020-05-14T16:41:15Z

@mkubaczyk i've addressed your feedback and it looks like it passes the build again

mkubaczyk · 2020-05-19T08:14:29Z

good, thanks! I'm merging it :-)

fmd · 2020-05-19T09:45:55Z

@mkubaczyk a point of note: we should (I can, in a new PR) update the Dockerfile again and cache the initial container somehow, but only when we update the versions -- it would vastly improve the Docker layering to do so

mkubaczyk · 2020-05-19T09:47:22Z

@fmd whatever you feel is right for the codebase, please propose it with MR. We love to see new things coming into the repo :-)

Performance improvements

fmd force-pushed the cache-fixes branch from ea79a1b to 3152bc7 Compare April 29, 2020 20:40

fmd changed the title ~~Performance improvements: Redux (WIP)~~ Performance improvements Apr 30, 2020

fmd mentioned this pull request Apr 30, 2020

Chart performance improvements #442

Closed

luisdavim reviewed May 1, 2020

View reviewed changes

mkubaczyk requested changes May 8, 2020

View reviewed changes

zopa-fareed-dudhia added 5 commits May 12, 2020 14:38

reduce number of helm commands for validation to number of chart/vers…

5a26151

…ion pairs to validate

cache some helm command results

002e01a

fix appVersion

5898e6c

fix release performancce improvements

0f62379

fix pr feedback and restrict plan to targetted apps

3eb2a47

fmd force-pushed the cache-fixes branch from 45ab5c1 to 3eb2a47 Compare May 12, 2020 13:42

fix variable name

09dae83

mkubaczyk requested changes May 13, 2020

View reviewed changes

internal/app/decision_maker.go Outdated Show resolved Hide resolved

zopa-fareed-dudhia and others added 2 commits May 13, 2020 12:09

revert target for plan

9035720

Merge branch 'master' into cache-fixes

1b7b45e

fmd commented May 14, 2020

View reviewed changes

fix missing quote

0158490

mkubaczyk approved these changes May 19, 2020

View reviewed changes

mkubaczyk merged commit 03cbd2d into Praqma:master May 19, 2020

mehyedes mentioned this pull request Jul 6, 2020

gpg and sops are not available anymore in the Docker image #500

Closed

mkubaczyk added a commit that referenced this pull request Aug 18, 2023

Merge pull request #445 from fmd/cache-fixes

f201f8b

Performance improvements

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance improvements #445

Performance improvements #445

fmd commented Apr 29, 2020 •

edited

Loading

fmd commented Apr 30, 2020

fmd commented Apr 30, 2020 •

edited

Loading

luisdavim May 1, 2020

mkubaczyk left a comment

mkubaczyk May 8, 2020

fmd commented May 11, 2020

mkubaczyk commented May 11, 2020

fmd commented May 12, 2020

fmd May 14, 2020

fmd commented May 14, 2020

mkubaczyk commented May 19, 2020

fmd commented May 19, 2020

mkubaczyk commented May 19, 2020

Performance improvements #445

Performance improvements #445

Conversation

fmd commented Apr 29, 2020 • edited Loading

fmd commented Apr 30, 2020

fmd commented Apr 30, 2020 • edited Loading

luisdavim May 1, 2020

Choose a reason for hiding this comment

mkubaczyk left a comment

Choose a reason for hiding this comment

mkubaczyk May 8, 2020

Choose a reason for hiding this comment

fmd commented May 11, 2020

mkubaczyk commented May 11, 2020

fmd commented May 12, 2020

fmd May 14, 2020

Choose a reason for hiding this comment

fmd commented May 14, 2020

mkubaczyk commented May 19, 2020

fmd commented May 19, 2020

mkubaczyk commented May 19, 2020

fmd commented Apr 29, 2020 •

edited

Loading

fmd commented Apr 30, 2020 •

edited

Loading